Recording Provenance on Probabilistic Databases

Authors

  • Ming Gao
  • Xiangnan He
  • Cheqing Jin
  • Xiaoling Wang
  • Aoying Zhou

Abstract

Tracking data provenance (or lineage) has become increasingly important in many large-scale applications. To date, a few methods have been proposed to record data provenance. However, most of them focus on deterministic databases; the exception is Trio-style lineage, which targets probabilistic databases. Processing provenance on probabilistic databases is even more challenging because of the exponential growth in the number of possible-world instances and the dependencies among intermediate tuples. In this paper, we propose a model, named PWP-tree, to record data provenance on probabilistic databases. We further show that several data provenance models for probabilistic databases, including Trio-style lineage, can be translated into our model for uncertainty propagation. Compared with Trio-style lineage, our model is independent of intermediate results, while its probability evaluation results are identical to those under possible-worlds semantics. Detailed experimental results demonstrate the effectiveness, efficiency, and scalability of the proposed model.
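The two difficulties named in the abstract (exponentially many possible worlds, and dependencies among intermediate tuples) can be illustrated with a small Python sketch. This is not the PWP-tree model; it is a brute-force evaluation under possible-worlds semantics, assuming a hypothetical tuple-independent base relation (`BASE`, with made-up tuple ids and probabilities) and lineage written as Boolean formulas over base tuples, in the spirit of Trio-style lineage. The helper names (`world_prob`, `exact_prob`, `i1`, `i2`, `out`) are hypothetical.

```python
from itertools import product

# Hypothetical tuple-independent base relation: tuple id -> marginal probability.
# The ids and probabilities are illustrative only.
BASE = {"a": 0.5, "b": 0.4, "c": 0.8}


def world_prob(world, base):
    """Probability of one possible world, assuming independent base tuples."""
    p = 1.0
    for tid, present in world.items():
        p *= base[tid] if present else 1.0 - base[tid]
    return p


def exact_prob(lineage, base):
    """Sum the probabilities of every possible world in which `lineage` holds.

    This follows possible-worlds semantics directly and enumerates all 2^n
    worlds for n base tuples, i.e. the exponential blow-up.
    """
    ids = sorted(base)
    total = 0.0
    for bits in product([False, True], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if lineage(world):
            total += world_prob(world, base)
    return total


# Two intermediate tuples that share base tuple "a", so they are NOT independent.
def i1(w):
    return w["a"] and w["b"]


def i2(w):
    return w["a"] and w["c"]


# An output tuple derived from either intermediate tuple (e.g. after a projection).
def out(w):
    return i1(w) or i2(w)


exact = exact_prob(out, BASE)

# Naive evaluation that wrongly treats i1 and i2 as independent events.
p1, p2 = exact_prob(i1, BASE), exact_prob(i2, BASE)
naive = 1.0 - (1.0 - p1) * (1.0 - p2)

print(f"exact={exact:.2f} naive={naive:.2f}")  # prints exact=0.44 naive=0.52
```

The exact value is P(a ∧ (b ∨ c)) = 0.5 × (1 − 0.6 × 0.2) = 0.44, whereas multiplying the intermediate tuples' probabilities as if they were independent gives 0.52. Keeping lineage in terms of base tuples avoids this error, at the cost of the enumeration shown here.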

Similar resources

Provenance and Probabilities in Relational Databases: From Theory to Practice

We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation...

Chapter 2 MODELS FOR INCOMPLETE AND PROBABILISTIC INFORMATION

We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and...

Deciding How to Store Provenance

Provenance of a file is metadata pertaining to the history of the file. Provenance, unlike normal metadata stored in file systems, is retrieved primarily by running queries. This implies that provenance has to be indexed and should have a query interface. We believe that databases are the most appropriate place to store provenance as they provide both indexing and query capabilities. The goal o...

Provenance in Databases (Tutorial Outline)

The provenance of data has recently been recognized as central to the trust one places in data. It is also important to annotation, to data integration and to probabilistic databases. Three workshops have been held on the topic, and it has been the focus of several research projects and prototype systems. This tutorial will attempt to provide an overview of research in provenance in databases w...

Circuits for Datalog Provenance

The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boo...


Publication date: 2009